Flurex animal, PFOS and SCFA data

1 INFO

This document contains the commands necessary to analyse experimental data obtained from Flurex (internal project name: R20-22). The project data contains: - Animal weight data including calculated weights per bw and normalized weight data in decimals: + body weight (bw) from day 0 to day 8, including bw gain from day 0 - 8 + liver and cecum weight from dissection on day 8

  • PFOS quantitative data:
    • total dosed PFOS per rat on day 4 and 8 respectively (mg)
    • blood day 4 and 8 in ug PFOS/mL serum including calculations on
      • total blood volume per animal based on an standard average of 64mL blood / kilogram in rats “Diehl et al. 2001”
      • total PFOS in blood volume (mg)
      • total PFOS detected in blood from total dosed per day (pct)
    • liver from dissection on day 8 in ug PFOS/g tissue including calculations on
      • total pfos in liver per rat (mg)
      • total PFOS detected in liver from total dosed on day 8 (pct)
  • Short-chain fatty acids quantification of 10 compounds in colonic water given in mM from day 8:
    • acetic acid, formic acid, propanoic acid, 2-methyl-propanoic acid, butanoic acid, 3-methyl-butanoic acid, pentanoic acid, 4-methyl-pentanoic acid, hexanoic acid and heptanoic acid

2 Setup

Following code loads packages, creates necessary folder and saves parameters for the following analyses.

knitr::opts_chunk$set(echo = TRUE)

# Load libraries
library(tidyverse)
library(phyloseq)
library(decontam)
library(pals)
library(ggpubr)
library(vegan)
library(phangorn)
library(kableExtra)
library(plotly)
library(rstatix)
library(forcats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggbreak)

# Create used folders if missing
if (!file.exists("R_objects")) dir.create(file.path(getwd(), "R_objects"))
if (!file.exists("plots")) dir.create(file.path(getwd(), "plots"))
if (!file.exists("plots/animal_data")) dir.create(file.path(getwd(), "plots/animal_data"))
if (!file.exists("scripts")) dir.create(file.path(getwd(), "scripts"))

# Save params
saveRDS(params, file = "R_objects/animal_params.RDS")

3 LOAD DATA

Loading data from CSV-format and saves as Rdata-format.

params <- readRDS("R_objects/animal_params.RDS")
## Error in eval(expr, envir, enclos): cannot change value of locked binding for 'params'
# Load analysis data
dat <- read.csv(params$input, header = TRUE, sep = ";", dec = ",")

save(dat, file = "R_objects/animal_data.Rdata")

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

4 ANIMAL WEIGHT DATA

Animal weight data contains data from body weight through the entire study period with calculated body weight gain, and organ weights from cecum and liver.

4.1 Body weight gain

This section will prepare to perform the data analysis for body weight gain

4.1.1 Statistics

4.1.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- dat
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bw_gain"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      bw_gain     12  17.1  2.26
## 2 PFOS      bw_gain     12  17.2  4.04
## 3 VAN       bw_gain     12  17.6  2.69
## 4 VAN+PFOS  bw_gain     12  17.5  2.65

4.1.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 PFOS      R30            18 yes   no     258.  262.  265.  271.  270.   274
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.951  0.0457

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.530 0.664
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that body weight gain data has two outliers, has equal variance and is normally distributed without the outliers according to Shapiro-Wilk test. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.

4.1.1.3 ANOVA One-Way test

4.1.1.3.1 Perform test

If we had equality of variance we can now run a one-way ANOVA tests anova_test() (if we have equal variance) or a welch_anova_test() (if variance vary).

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd     F     p p<.05   ges
## 1 treatment   3  44 0.064 0.979       0.004
4.1.1.3.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term   group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr>  <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 treat… CTRL   PFOS            0   0.111     -3.15      3.37 1     ns          
## 2 treat… CTRL   VAN             0   0.463     -2.79      3.72 0.981 ns          
## 3 treat… CTRL   VAN+P…          0   0.374     -2.88      3.63 0.99  ns          
## 4 treat… PFOS   VAN             0   0.353     -2.90      3.61 0.991 ns          
## 5 treat… PFOS   VAN+P…          0   0.264     -2.99      3.52 0.996 ns          
## 6 treat… VAN    VAN+P…          0  -0.0893    -3.35      3.17 1     ns

4.1.2 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "Bodyweight gain",limits = c(5,25),breaks = seq(5,25,5), labels = function(x) paste0(x, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

# Output plot
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Body weight gain from day 0 to day 8

4.2 Cecum weight (normalized)

This section will prepare to perform the data analysis for normalized cecum weight data

4.2.1 Statistics

4.2.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_norm"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable       n  mean    sd
##   <chr>     <fct>      <dbl> <dbl> <dbl>
## 1 CTRL      cecum_norm    11 1     0.148
## 2 PFOS      cecum_norm    12 0.944 0.164
## 3 VAN       cecum_norm    12 1.88  0.24 
## 4 VAN+PFOS  cecum_norm    11 2.07  0.201

4.2.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R15            27 no    yes    268.  277.  283.  290.  296.   300
## 2 VAN       R24            36 no    yes    281.  286.  294.  305.  309.   312
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.984   0.753

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    42     0.416 0.742
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that normalised cecum weight data has two outliers, is normally distribution and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.

4.2.1.3 ANOVA One-Way test

4.2.1.3.1 Perform test

If we had equality of variance we can now run a one-way ANOVA tests anova_test() (if we have equal variance) or a welch_anova_test() (if variance vary).

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd       F        p p<.05   ges
## 1 treatment   3  42 106.226 1.21e-19     * 0.884
4.2.1.3.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term      group1 group2   null.value estimate conf.low conf.high    p.adj
## * <chr>     <chr>  <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 treatment CTRL   PFOS              0  -0.0562  -0.271      0.158 8.96e- 1
## 2 treatment CTRL   VAN               0   0.884    0.669      1.10  1.42e-12
## 3 treatment CTRL   VAN+PFOS          0   1.07     0.850      1.29  1.06e-12
## 4 treatment PFOS   VAN               0   0.940    0.730      1.15  1.09e-12
## 5 treatment PFOS   VAN+PFOS          0   1.13     0.911      1.34  1.06e-12
## 6 treatment VAN    VAN+PFOS          0   0.186   -0.0287     0.400 1.1 e- 1
## # ℹ 1 more variable: p.adj.signif <chr>

4.2.2 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% difference",limits = c(0.5,3.1),breaks = seq(0.5,3.1,0.5), labels = function(x) paste0(x*100, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.2,2.8,2.5,3.1))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Normalized cecum weights at day 8

4.3 Liver weight (normalized)

This section will prepare to perform the data analysis for normalized liver weight data

4.3.1 Statistics

4.3.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_norm"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable       n  mean    sd
##   <chr>     <fct>      <dbl> <dbl> <dbl>
## 1 CTRL      liver_norm    12 1     0.055
## 2 PFOS      liver_norm    12 1.08  0.055
## 3 VAN       liver_norm    12 0.934 0.044
## 4 VAN+PFOS  liver_norm    12 1.03  0.065

4.3.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rækker> (eller 0-længde row.names)

Data contains zero outliers.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.985   0.778

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.430 0.733
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that normalised liver weight data has two outliers, is normally distribution and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.

4.3.1.3 ANOVA One-Way test

4.3.1.3.1 Perform test

If we had equality of variance we can now run a one-way ANOVA tests anova_test() (if we have equal variance) or a welch_anova_test() (if variance vary).

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd      F        p p<.05  ges
## 1 treatment   3  44 15.909 3.79e-07     * 0.52
4.3.1.3.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term      group1 group2   null.value estimate conf.low conf.high       p.adj
## * <chr>     <chr>  <chr>         <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
## 1 treatment CTRL   PFOS              0   0.0846   0.0246   0.145   0.0027     
## 2 treatment CTRL   VAN               0  -0.0663  -0.126   -0.00623 0.0254     
## 3 treatment CTRL   VAN+PFOS          0   0.0352  -0.0249   0.0953  0.409      
## 4 treatment PFOS   VAN               0  -0.151   -0.211   -0.0909  0.000000182
## 5 treatment PFOS   VAN+PFOS          0  -0.0494  -0.110    0.0107  0.14       
## 6 treatment VAN    VAN+PFOS          0   0.102    0.0415   0.162   0.000269   
## # ℹ 1 more variable: p.adj.signif <chr>

4.3.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% difference",limits = c(0.75,1.5),breaks = seq(0.75,1.5,0.25), labels = function(x) paste0(x*100, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(1.35,1.4,1.45,1.5))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Normalized liver weights at day 8

5 PFOS QUANTITATIVE DATA

Following section handles data analysis of PFOS from serum and liver samples (Run on Dionex Ultimate 3000 / Bruker EVOQ Elite UPLC-MS/MS against linear PPOS standard curve and with internal MPFOS standard).

5.1 Blood serum day 4

This section will prepare to perform the data analysis for PFOS data from serum on day 4.

5.1.1 ug/mL in serum

5.1.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum4_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum4_ugml"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable             n   mean    sd
##   <chr>     <fct>            <dbl>  <dbl> <dbl>
## 1 CTRL      pfos_serum4_ugml    11  0     0.001
## 2 PFOS      pfos_serum4_ugml    12  9.17  2.01 
## 3 VAN       pfos_serum4_ugml    11  0.001 0.001
## 4 VAN+PFOS  pfos_serum4_ugml    10 10.0   1.18

5.1.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R06             6 no    no     274.  276.  285.  296.  297.   302
## 2 CTRL      R11            11 no    no     271.  278.  287.  294.  293.   299
## 3 PFOS      R29            17 yes   no     239.  243.  248.  255.  259.   267
## 4 VAN       R13            25 no    yes    218.  222.  228.  234   231.   241
## 5 VAN       R21            33 no    yes    262.  268.  274.  285.  281.   291
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.807 0.00000412

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    40      7.48 0.000436
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.1.1.3 Kruskal-Wallis test

5.1.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                  n statistic    df           p method        
## * <chr>            <int>     <dbl> <int>       <dbl> <chr>         
## 1 pfos_serum4_ugml    44      35.4     3 0.000000101 Kruskal-Wallis
5.1.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                  n effsize method  magnitude
## * <chr>            <int>   <dbl> <chr>   <ord>    
## 1 pfos_serum4_ugml    44   0.809 eta2[H] large
5.1.1.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_serum4_… CTRL   PFOS      11    12    3.86   1.15e-4 1.85e-4 ***         
## 2 pfos_serum4_… CTRL   VAN       11    11    0.0172 9.86e-1 9.86e-1 ns          
## 3 pfos_serum4_… CTRL   VAN+P…    11    10    4.53   5.87e-6 1.91e-5 ****        
## 4 pfos_serum4_… PFOS   VAN       12    11   -3.84   1.23e-4 1.85e-4 ***         
## 5 pfos_serum4_… PFOS   VAN+P…    12    10    0.863  3.88e-1 4.66e-1 ns          
## 6 pfos_serum4_… VAN    VAN+P…    11    10    4.51   6.36e-6 1.91e-5 ****

5.1.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/mL",limits = c(0,20),breaks = seq(0,20,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(14,17,15,14))
p
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS conc. in ug/mL at day 4

5.1.2 Total mg in serum

5.1.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_mg"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_mg))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.1.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rækker> (eller 0-længde row.names)

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_serum4_mg     0.955 0.717
## 2 VAN+PFOS  pfos_serum4_mg     0.930 0.446
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20    0.0274 0.870
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.

5.1.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1  -0.0171     0.165     0.183 pfos_s… PFOS   VAN+P…    12    10     -1.23 0.232
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

  • p.adj: the adjusted p-value.

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum4_mg PFOS   VAN+PFOS  -0.528    12    10 moderate

5.1.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,0.30),breaks = seq(0,0.30,0.1)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.28))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Total mg PFOS in blood volume

5.1.3 Pct.

Data for PFOS levels in serum at day 4 calculated from the total PFOS dosed at the time point. #### Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_pct"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_pct))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.1.3.1 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R29            17 yes   no     239.  243.  248.  255.  259.   267
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable        statistic     p
##   <chr>     <chr>               <dbl> <dbl>
## 1 PFOS      pfos_serum4_pct     0.937 0.456
## 2 VAN+PFOS  pfos_serum4_pct     0.939 0.547
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20      1.43 0.246
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

5.1.3.2 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   -0.626      6.75      7.37 pfos_s… PFOS   VAN+P…    12    10     -1.17 0.254
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

  • p.adj: the adjusted p-value.

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.             group1 group2   effsize    n1    n2 magnitude
## * <chr>           <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum4_pct PFOS   VAN+PFOS  -0.503    12    10 moderate

5.1.3.3 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(3,10),breaks = seq(3,10,1)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS serum day 4 in pct. of total

5.2 Blood serum day 8

This section will prepare to perform the data analysis for PFOS data from serum on day 8.

5.2.1 ug/µL in serum

5.2.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum8_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum8_ugml"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable             n   mean     sd
##   <chr>     <fct>            <dbl>  <dbl>  <dbl>
## 1 CTRL      pfos_serum8_ugml    12  0.016  0.03 
## 2 PFOS      pfos_serum8_ugml    12 36.3   15.5  
## 3 VAN       pfos_serum8_ugml    12  0.011  0.021
## 4 VAN+PFOS  pfos_serum8_ugml    12 32.2   10.7

5.2.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 7 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R05             5 no    no     214   219.  222.  229.  231.   237
## 2 CTRL      R09             9 no    no     273.  278.  290.  290.  296.   302
## 3 CTRL      R12            12 no    no     297.  304.  316.  322.  324.   334
## 4 VAN       R14            26 no    yes    246.  256.  260.  267.  270.   274
## 5 VAN       R16            28 no    yes    256.  260   268.  275.  273    279
## 6 VAN       R24            36 no    yes    281.  286.  294.  305.  309.   312
## 7 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.823 0.00000461

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic         p
##   <int> <int>     <dbl>     <dbl>
## 1     3    44      8.92 0.0000993
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.2.1.3 Kruskal-Wallis test

5.2.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                  n statistic    df            p method        
## * <chr>            <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_serum8_ugml    48      37.3     3 0.0000000402 Kruskal-Wallis
5.2.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                  n effsize method  magnitude
## * <chr>            <int>   <dbl> <chr>   <ord>    
## 1 pfos_serum8_ugml    48   0.779 eta2[H] large
5.2.1.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_serum8_… CTRL   PFOS      12    12    4.37   1.26e-5 3.78e-5 ****        
## 2 pfos_serum8_… CTRL   VAN       12    12   -0.0899 9.28e-1 9.28e-1 ns          
## 3 pfos_serum8_… CTRL   VAN+P…    12    12    4.17   3.02e-5 4.53e-5 ****        
## 4 pfos_serum8_… PFOS   VAN       12    12   -4.46   8.32e-6 3.78e-5 ****        
## 5 pfos_serum8_… PFOS   VAN+P…    12    12   -0.195  8.46e-1 9.28e-1 ns          
## 6 pfos_serum8_… VAN    VAN+P…    12    12    4.26   2.03e-5 4.05e-5 ****

5.2.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/mL",limits = c(0,80),breaks = seq(0,80,10)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(72,80,75,72))
p
## Warning: Removed 11 rows containing missing values (`geom_point()`).

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS conc. in ug/mL at day 8

5.2.2 Total mg in serum

5.2.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_mg"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_mg))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.2.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rækker> (eller 0-længde row.names)

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_serum8_mg     0.890 0.119
## 2 VAN+PFOS  pfos_serum8_mg     0.902 0.168
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22      1.98 0.173
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.

5.2.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   0.0931     0.717     0.624 pfos_s… PFOS   VAN+P…    12    12     0.853 0.403
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum8_mg PFOS   VAN+PFOS   0.348    12    12 small

5.2.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,2),breaks = seq(0,2,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(1.75))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Total mg PFOS in blood volume

5.2.3 Pct.

5.2.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_pct"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes" & !rat_name == "R47")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_pct))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.2.3.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rækker> (eller 0-længde row.names)

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable        statistic      p
##   <chr>     <chr>               <dbl>  <dbl>
## 1 PFOS      pfos_serum8_pct     0.867 0.0594
## 2 VAN+PFOS  pfos_serum8_pct     0.899 0.181
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    21      2.86 0.106
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.

5.2.3.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1     2.05      11.8      9.74 pfos_s… PFOS   VAN+P…    12    11      1.29 0.212
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.             group1 group2   effsize    n1    n2 magnitude
## * <chr>           <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum8_pct PFOS   VAN+PFOS   0.537    12    11 moderate

5.2.3.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(5,25),breaks = seq(5,25,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(24))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS serum day 8 in pct. of total

5.3 Blood serum day 4 and 8

This section will prepare to perform the data analysis for PFOS data from serum on day 4 and 8 collected.

5.3.1 Change from day 4 to 8 (Pct.)

5.3.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_change48_pct"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes") # add following to subset() to remove the outliers: & !rat_name %in% c("R47","R27"))

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_change48_pct))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.3.1.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R27            15 yes   no     270.  280.  284.  291.  290.   296
## 2 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Any extreme outliers can be bad samples or errors in data entry. If outliers, compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable          statistic      p
##   <chr>     <chr>                 <dbl>  <dbl>
## 1 PFOS      pfos_change48_pct     0.894 0.132 
## 2 VAN+PFOS  pfos_change48_pct     0.805 0.0167
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20     0.639 0.434
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

Two outliers were identified (sample for R27 and R47). Analysis result and test method is similar with and without outliers. Data is normally distributed and has equal variance. Hence we use t-test.

5.3.1.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1     85.1      341.      256. pfos_c… PFOS   VAN+P…    12    10      1.11 0.281
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.               group1 group2   effsize    n1    n2 magnitude
## * <chr>             <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_change48_pct PFOS   VAN+PFOS   0.475    12    10 small

5.3.1.4 Conclusion

5.3.1.5 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create point plot with mean and SD
data_summary <- function(x) {
  m <- mean(x)
  ymin <- m-sd(x)
  ymax <- m+sd(x)
  return(c(y=m,ymin=ymin,ymax=ymax))
}
data_summary_collapsed <- function(x) {
  m <- mean(x)
  ymin <- m
  ymax <- m
  return(c(y=m,ymin=ymin,ymax=ymax))
}

p <- ggplot(dat.clean, aes(x = .data[[PREDICTOR]], y = .data[[OUTCOME]], color = .data[[PREDICTOR]])) +
  stat_summary(fun.data = data_summary_collapsed, geom = "crossbar", color = "black", width = 0.5, linewidth = 0.3) +
  stat_summary(fun.data = data_summary, geom = "errorbar", color = "black", width = 0.15, linewidth = 0.5) +
  geom_point(position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.4), size = 2, colour = "black", shape = 21, stroke = 0.5, aes(fill = treatment)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100), labels = function(x) paste0(x, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme_pubr()
p

# Alternative: Create boxplot
# p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
#           fill = PREDICTOR,
#           add =  "jitter",
#           add.params = list(size = 1)) +
#   scale_fill_manual(values = params$COL) +
#   scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100)) +
#   labs(fill = "Treatment") +
#   scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS serum level change from day 4 to 8 in pct.

5.3.2 Data ug/mL

5.3.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")

# Subset data
dat.sub <- subset(dat, pfos == "yes")

# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_ugml, pfos_serum8_ugml) %>%
  pivot_longer(., cols = c(pfos_serum4_ugml, pfos_serum8_ugml), names_to = "data_group", values_to = "conc")

# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_ugml","d8","d4"))

# Create ID column for easier handling
for (i in dat.sub$rat_name) {
  dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)
}

# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(conc))
dat.clean
##    rat_name treatment       data_group  conc day            ID
## 1       R25      PFOS pfos_serum4_ugml  8.56  d4     d4 _ PFOS
## 3       R26      PFOS pfos_serum4_ugml  9.60  d4     d4 _ PFOS
## 5       R27      PFOS pfos_serum4_ugml  7.92  d4     d4 _ PFOS
## 7       R28      PFOS pfos_serum4_ugml  8.64  d4     d4 _ PFOS
## 9       R29      PFOS pfos_serum4_ugml 12.96  d4     d4 _ PFOS
## 11      R30      PFOS pfos_serum4_ugml  8.68  d4     d4 _ PFOS
## 13      R31      PFOS pfos_serum4_ugml  5.72  d4     d4 _ PFOS
## 15      R32      PFOS pfos_serum4_ugml  7.56  d4     d4 _ PFOS
## 17      R33      PFOS pfos_serum4_ugml  8.24  d4     d4 _ PFOS
## 19      R34      PFOS pfos_serum4_ugml 11.36  d4     d4 _ PFOS
## 21      R35      PFOS pfos_serum4_ugml  9.00  d4     d4 _ PFOS
## 23      R36      PFOS pfos_serum4_ugml 11.84  d4     d4 _ PFOS
## 25      R37  VAN+PFOS pfos_serum4_ugml  9.36  d4 d4 _ VAN+PFOS
## 27      R38  VAN+PFOS pfos_serum4_ugml 11.76  d4 d4 _ VAN+PFOS
## 29      R39  VAN+PFOS pfos_serum4_ugml 10.64  d4 d4 _ VAN+PFOS
## 31      R40  VAN+PFOS pfos_serum4_ugml 12.12  d4 d4 _ VAN+PFOS
## 33      R41  VAN+PFOS pfos_serum4_ugml  9.12  d4 d4 _ VAN+PFOS
## 35      R42  VAN+PFOS pfos_serum4_ugml  9.80  d4 d4 _ VAN+PFOS
## 37      R43  VAN+PFOS pfos_serum4_ugml 10.08  d4 d4 _ VAN+PFOS
## 43      R46  VAN+PFOS pfos_serum4_ugml  9.88  d4 d4 _ VAN+PFOS
## 45      R47  VAN+PFOS pfos_serum4_ugml  8.80  d4 d4 _ VAN+PFOS
## 47      R48  VAN+PFOS pfos_serum4_ugml  8.60  d4 d4 _ VAN+PFOS
## 2       R25      PFOS pfos_serum8_ugml 43.92  d8     d8 _ PFOS
## 4       R26      PFOS pfos_serum8_ugml 45.60  d8     d8 _ PFOS
## 6       R27      PFOS pfos_serum8_ugml 69.92  d8     d8 _ PFOS
## 8       R28      PFOS pfos_serum8_ugml 21.92  d8     d8 _ PFOS
## 10      R29      PFOS pfos_serum8_ugml 26.56  d8     d8 _ PFOS
## 12      R30      PFOS pfos_serum8_ugml 47.84  d8     d8 _ PFOS
## 14      R31      PFOS pfos_serum8_ugml 21.84  d8     d8 _ PFOS
## 16      R32      PFOS pfos_serum8_ugml 29.36  d8     d8 _ PFOS
## 18      R33      PFOS pfos_serum8_ugml 22.08  d8     d8 _ PFOS
## 20      R34      PFOS pfos_serum8_ugml 52.48  d8     d8 _ PFOS
## 22      R35      PFOS pfos_serum8_ugml 29.84  d8     d8 _ PFOS
## 24      R36      PFOS pfos_serum8_ugml 23.76  d8     d8 _ PFOS
## 26      R37  VAN+PFOS pfos_serum8_ugml 36.08  d8 d8 _ VAN+PFOS
## 28      R38  VAN+PFOS pfos_serum8_ugml 31.28  d8 d8 _ VAN+PFOS
## 30      R39  VAN+PFOS pfos_serum8_ugml 25.92  d8 d8 _ VAN+PFOS
## 32      R40  VAN+PFOS pfos_serum8_ugml 23.92  d8 d8 _ VAN+PFOS
## 34      R41  VAN+PFOS pfos_serum8_ugml 21.68  d8 d8 _ VAN+PFOS
## 36      R42  VAN+PFOS pfos_serum8_ugml 40.96  d8 d8 _ VAN+PFOS
## 38      R43  VAN+PFOS pfos_serum8_ugml 25.60  d8 d8 _ VAN+PFOS
## 40      R44  VAN+PFOS pfos_serum8_ugml 37.44  d8 d8 _ VAN+PFOS
## 42      R45  VAN+PFOS pfos_serum8_ugml 25.36  d8 d8 _ VAN+PFOS
## 44      R46  VAN+PFOS pfos_serum8_ugml 34.72  d8 d8 _ VAN+PFOS
## 46      R47  VAN+PFOS pfos_serum8_ugml 59.52  d8 d8 _ VAN+PFOS
## 48      R48  VAN+PFOS pfos_serum8_ugml 23.76  d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "conc"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   ID            variable     n  mean    sd
##   <chr>         <fct>    <dbl> <dbl> <dbl>
## 1 d4 _ PFOS     conc        12  9.17  2.01
## 2 d4 _ VAN+PFOS conc        10 10.0   1.18
## 3 d8 _ PFOS     conc        12 36.3  15.5 
## 4 d8 _ VAN+PFOS conc        12 32.2  10.7

5.3.2.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = COL)
bxp

5.3.2.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 8
##   ID            rat_name treatment data_group   conc day   is.outlier is.extreme
##   <chr>         <chr>    <chr>     <chr>       <dbl> <chr> <lgl>      <lgl>     
## 1 d4 _ PFOS     R29      PFOS      pfos_serum…  13.0 d4    TRUE       FALSE     
## 2 d8 _ VAN+PFOS R47      VAN+PFOS  pfos_serum…  59.5 d8    TRUE       FALSE

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.875 0.000151

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic       p
##   <int> <int>     <dbl>   <dbl>
## 1     3    42      6.46 0.00108
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that PFOS concentrations from day 4 and 8 has two outliers, has unequal variance, and falls short on the Shapiro-Wilk test of normality (not normally distributed). Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.3.2.4 Kruskal-Wallis test

5.3.2.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.       n statistic    df           p method        
## * <chr> <int>     <dbl> <int>       <dbl> <chr>         
## 1 conc     46      34.4     3 0.000000165 Kruskal-Wallis
5.3.2.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 conc     46   0.747 eta2[H] large
5.3.2.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.   group1        group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr> <chr>         <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 conc  d4 _ PFOS     d4 _ V…    12    10     0.798 4.25e-1 5.10e-1 ns          
## 2 conc  d4 _ PFOS     d8 _ P…    12    12     4.68  2.92e-6 1.75e-5 ****        
## 3 conc  d4 _ PFOS     d8 _ V…    12    12     4.48  7.51e-6 2.25e-5 ****        
## 4 conc  d4 _ VAN+PFOS d8 _ P…    10    12     3.66  2.51e-4 5.02e-4 ***         
## 5 conc  d4 _ VAN+PFOS d8 _ V…    10    12     3.47  5.15e-4 7.73e-4 ***         
## 6 conc  d8 _ PFOS     d8 _ V…    12    12    -0.198 8.43e-1 8.43e-1 ns

5.3.2.5 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
  scale_y_continuous(name = "ug/mL",limits = c(0,85),breaks = seq(0,85,10)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8"))

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(75,85,70,80))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS conc. in ug/mL at day 8

5.3.3 Data mg

5.3.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")

# Subset data
dat.sub <- subset(dat, pfos == "yes")

# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_mg, pfos_serum8_mg) %>%
  pivot_longer(., cols = c(pfos_serum4_mg, pfos_serum8_mg), names_to = "data_group", values_to = "mg")

# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_mg","d8","d4"))

# Create ID column for easier handling
for (i in dat.sub$rat_name) {
  dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)
}

# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(mg))
dat.clean
##    rat_name treatment     data_group        mg day            ID
## 1       R25      PFOS pfos_serum4_mg 0.1904292  d4     d4 _ PFOS
## 3       R26      PFOS pfos_serum4_mg 0.1592525  d4     d4 _ PFOS
## 5       R27      PFOS pfos_serum4_mg 0.1472486  d4     d4 _ PFOS
## 7       R28      PFOS pfos_serum4_mg 0.1480827  d4     d4 _ PFOS
## 9       R29      PFOS pfos_serum4_mg 0.2149079  d4     d4 _ PFOS
## 11      R30      PFOS pfos_serum4_mg 0.1498237  d4     d4 _ PFOS
## 13      R31      PFOS pfos_serum4_mg 0.1123500  d4     d4 _ PFOS
## 15      R32      PFOS pfos_serum4_mg 0.1359590  d4     d4 _ PFOS
## 17      R33      PFOS pfos_serum4_mg 0.1546747  d4     d4 _ PFOS
## 19      R34      PFOS pfos_serum4_mg 0.2121503  d4     d4 _ PFOS
## 21      R35      PFOS pfos_serum4_mg 0.1792512  d4     d4 _ PFOS
## 23      R36      PFOS pfos_serum4_mg 0.1811804  d4     d4 _ PFOS
## 25      R37  VAN+PFOS pfos_serum4_mg 0.1927112  d4 d4 _ VAN+PFOS
## 27      R38  VAN+PFOS pfos_serum4_mg 0.2344474  d4 d4 _ VAN+PFOS
## 29      R39  VAN+PFOS pfos_serum4_mg 0.1721467  d4 d4 _ VAN+PFOS
## 31      R40  VAN+PFOS pfos_serum4_mg 0.2381338  d4 d4 _ VAN+PFOS
## 33      R41  VAN+PFOS pfos_serum4_mg 0.1657651  d4 d4 _ VAN+PFOS
## 35      R42  VAN+PFOS pfos_serum4_mg 0.1652672  d4 d4 _ VAN+PFOS
## 37      R43  VAN+PFOS pfos_serum4_mg 0.2037289  d4 d4 _ VAN+PFOS
## 43      R46  VAN+PFOS pfos_serum4_mg 0.1702838  d4 d4 _ VAN+PFOS
## 45      R47  VAN+PFOS pfos_serum4_mg 0.1501491  d4 d4 _ VAN+PFOS
## 47      R48  VAN+PFOS pfos_serum4_mg 0.1332518  d4 d4 _ VAN+PFOS
## 2       R25      PFOS pfos_serum8_mg 1.0930000  d8     d8 _ PFOS
## 4       R26      PFOS pfos_serum8_mg 0.8230000  d8     d8 _ PFOS
## 6       R27      PFOS pfos_serum8_mg 1.3690000  d8     d8 _ PFOS
## 8       R28      PFOS pfos_serum8_mg 0.3980000  d8     d8 _ PFOS
## 10      R29      PFOS pfos_serum8_mg 0.4810000  d8     d8 _ PFOS
## 12      R30      PFOS pfos_serum8_mg 0.8450000  d8     d8 _ PFOS
## 14      R31      PFOS pfos_serum8_mg 0.4720000  d8     d8 _ PFOS
## 16      R32      PFOS pfos_serum8_mg 0.5750000  d8     d8 _ PFOS
## 18      R33      PFOS pfos_serum8_mg 0.4550000  d8     d8 _ PFOS
## 20      R34      PFOS pfos_serum8_mg 1.0550000  d8     d8 _ PFOS
## 22      R35      PFOS pfos_serum8_mg 0.6510000  d8     d8 _ PFOS
## 24      R36      PFOS pfos_serum8_mg 0.3890000  d8     d8 _ PFOS
## 26      R37  VAN+PFOS pfos_serum8_mg 0.7850000  d8 d8 _ VAN+PFOS
## 28      R38  VAN+PFOS pfos_serum8_mg 0.6670000  d8 d8 _ VAN+PFOS
## 30      R39  VAN+PFOS pfos_serum8_mg 0.4530000  d8 d8 _ VAN+PFOS
## 32      R40  VAN+PFOS pfos_serum8_mg 0.5010000  d8 d8 _ VAN+PFOS
## 34      R41  VAN+PFOS pfos_serum8_mg 0.4270000  d8 d8 _ VAN+PFOS
## 36      R42  VAN+PFOS pfos_serum8_mg 0.7600000  d8 d8 _ VAN+PFOS
## 38      R43  VAN+PFOS pfos_serum8_mg 0.5540000  d8 d8 _ VAN+PFOS
## 40      R44  VAN+PFOS pfos_serum8_mg 0.7520000  d8 d8 _ VAN+PFOS
## 42      R45  VAN+PFOS pfos_serum8_mg 0.4590000  d8 d8 _ VAN+PFOS
## 44      R46  VAN+PFOS pfos_serum8_mg 0.6440000  d8 d8 _ VAN+PFOS
## 46      R47  VAN+PFOS pfos_serum8_mg 1.0890000  d8 d8 _ VAN+PFOS
## 48      R48  VAN+PFOS pfos_serum8_mg 0.3980000  d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "mg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   ID            variable     n  mean    sd
##   <chr>         <fct>    <dbl> <dbl> <dbl>
## 1 d4 _ PFOS     mg          12 0.165 0.031
## 2 d4 _ VAN+PFOS mg          10 0.183 0.034
## 3 d8 _ PFOS     mg          12 0.717 0.32 
## 4 d8 _ VAN+PFOS mg          12 0.624 0.201

5.3.3.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = COL)
bxp

5.3.3.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## [1] ID         rat_name   treatment  data_group mg         day        is.outlier
## [8] is.extreme
## <0 rækker> (eller 0-længde row.names)

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.896 0.000640

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic         p
##   <int> <int>     <dbl>     <dbl>
## 1     3    42      9.67 0.0000564
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that PFOS concentrations from day 4 and 8 has two outliers, has unequal variance, and falls short on the Shapiro-Wilk test of normality (not normally distributed). Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.3.3.4 Kruskal-Wallis test

5.3.3.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.       n statistic    df           p method        
## * <chr> <int>     <dbl> <int>       <dbl> <chr>         
## 1 mg       46      34.1     3 0.000000187 Kruskal-Wallis
5.3.3.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 mg       46   0.741 eta2[H] large
5.3.3.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.   group1        group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr> <chr>         <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 mg    d4 _ PFOS     d4 _ V…    12    10     0.574 5.66e-1 6.79e-1 ns          
## 2 mg    d4 _ PFOS     d8 _ P…    12    12     4.62  3.92e-6 2.35e-5 ****        
## 3 mg    d4 _ PFOS     d8 _ V…    12    12     4.33  1.51e-5 4.54e-5 ****        
## 4 mg    d4 _ VAN+PFOS d8 _ P…    10    12     3.83  1.30e-4 2.60e-4 ***         
## 5 mg    d4 _ VAN+PFOS d8 _ V…    10    12     3.55  3.84e-4 5.75e-4 ***         
## 6 mg    d8 _ PFOS     d8 _ V…    12    12    -0.289 7.73e-1 7.73e-1 ns

5.3.3.5 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
  scale_y_continuous(name = "mg",limits = c(0,1.75),breaks = seq(0,1.75,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8")) +
  theme(axis.title.x = element_blank())

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(1.635,1.75,1.4,1.52))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS amount in mg at day 4 and 8

5.4 Liver day 8

This section will prepare to perform the data analysis for PFOS data from liver on day 8.

5.4.1 ug/g in liver tissue

5.4.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_ugg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_ugg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable           n    mean     sd
##   <chr>     <fct>          <dbl>   <dbl>  <dbl>
## 1 CTRL      pfos_liver_ugg    12   0.199  0.173
## 2 PFOS      pfos_liver_ugg    12 176.    21.7  
## 3 VAN       pfos_liver_ugg    12   0.205  0.22 
## 4 VAN+PFOS  pfos_liver_ugg    12 196.    18.5

5.4.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R11            11 no    no     271.  278.  287.  294.  293.   299
## 2 VAN       R16            28 no    yes    256.  260   268.  275.  273    279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers: sample from rat_name R01 and R30.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.877 0.000119

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic          p
##   <int> <int>     <dbl>      <dbl>
## 1     3    44      12.2 0.00000631
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.4.1.3 Kruskal-Wallis test

5.4.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                n statistic    df            p method        
## * <chr>          <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_liver_ugg    48      36.4     3 0.0000000608 Kruskal-Wallis
5.4.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                n effsize method  magnitude
## * <chr>          <int>   <dbl> <chr>   <ord>    
## 1 pfos_liver_ugg    48   0.760 eta2[H] large
5.4.1.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_liver_u… CTRL   PFOS      12    12     3.62  2.98e-4 4.47e-4 ***         
## 2 pfos_liver_u… CTRL   VAN       12    12    -0.102 9.19e-1 9.19e-1 ns          
## 3 pfos_liver_u… CTRL   VAN+P…    12    12     4.68  2.85e-6 8.54e-6 ****        
## 4 pfos_liver_u… PFOS   VAN       12    12    -3.72  2.00e-4 4.00e-4 ***         
## 5 pfos_liver_u… PFOS   VAN+P…    12    12     1.06  2.87e-1 3.44e-1 ns          
## 6 pfos_liver_u… VAN    VAN+P…    12    12     4.78  1.72e-6 8.54e-6 ****

5.4.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/g",limits = c(0,300),breaks = seq(0,300,50)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(240,280,260,240))
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS conc. in ug/g at day 8 ### mg in liver tissue in all groups #### Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_mg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable          n  mean    sd
##   <chr>     <fct>         <dbl> <dbl> <dbl>
## 1 CTRL      pfos_liver_mg    12 0.002 0.002
## 2 PFOS      pfos_liver_mg    12 2.11  0.305
## 3 VAN       pfos_liver_mg    12 0.002 0.002
## 4 VAN+PFOS  pfos_liver_mg    12 2.22  0.341

5.4.1.5 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R16            28 no    yes    256.   260  268.  275.   273   279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains one not extreme outliers (R16).

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic   p.value
##   <chr>                <dbl>     <dbl>
## 1 residuals(model)     0.861 0.0000433

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic           p
##   <int> <int>     <dbl>       <dbl>
## 1     3    44      16.5 0.000000257
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that mg PFOS in liver in all groups has one outlier, unequal variance, and falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

5.4.1.6 Kruskal-Wallis test

5.4.1.6.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.               n statistic    df            p method        
## * <chr>         <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_liver_mg    48      35.6     3 0.0000000927 Kruskal-Wallis
5.4.1.6.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.               n effsize method  magnitude
## * <chr>         <int>   <dbl> <chr>   <ord>    
## 1 pfos_liver_mg    48   0.740 eta2[H] large
5.4.1.6.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_liver_mg CTRL   PFOS      12    12     3.86  1.12e-4 1.67e-4 ***         
## 2 pfos_liver_mg CTRL   VAN       12    12    -0.146 8.84e-1 8.84e-1 ns          
## 3 pfos_liver_mg CTRL   VAN+P…    12    12     4.39  1.14e-5 3.42e-5 ****        
## 4 pfos_liver_mg PFOS   VAN       12    12    -4.01  6.08e-5 1.22e-4 ***         
## 5 pfos_liver_mg PFOS   VAN+P…    12    12     0.525 6.00e-1 7.20e-1 ns          
## 6 pfos_liver_mg VAN    VAN+P…    12    12     4.53  5.77e-6 3.42e-5 ****

5.4.1.7 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,3.5),breaks = seq(0,3.5,0.01)) +
  scale_y_break(breaks = c(0.01,1), scales = 3, ticklabels = c(1.0,2.0,3.0), space = 0.3) +
  labs(fill = "Treatment") +
  theme(axis.title.x = element_blank(),
        axis.line.y.right = element_blank(),
        axis.text.y.right = element_blank(),
        axis.ticks.y.right = element_blank()) +
  scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(3.2,3,3.4,3.2))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS conc. in ug/g at day 8

5.4.2 Total mg in liver

5.4.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_mg))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.4.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rækker> (eller 0-længde row.names)

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable      statistic     p
##   <chr>     <chr>             <dbl> <dbl>
## 1 PFOS      pfos_liver_mg     0.944 0.546
## 2 VAN+PFOS  pfos_liver_mg     0.955 0.711
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.142 0.710
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.

5.4.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   -0.115      2.11      2.22 pfos_l… PFOS   VAN+P…    12    12    -0.870 0.394
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.           group1 group2   effsize    n1    n2 magnitude
## * <chr>         <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_liver_mg PFOS   VAN+PFOS  -0.355    12    12 small

5.4.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,3),breaks = seq(0,3,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(3))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Total mg PFOS in total liver

5.4.3 Pct.

5.4.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_pct"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_pct))

# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.4.3.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R26            14 yes   no     238.  246.  248   257   259.   265
## 2 PFOS      R27            15 yes   no     270.  280.  284.  291.  290.   296
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_liver_pct     0.985 0.997
## 2 VAN+PFOS  pfos_liver_pct     0.937 0.456
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.

If the data does not follow the normal distribution run a Wilcoxon Rank-sum test

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22   0.00119 0.973
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

Two outliers were identified but analysis does not differ in result when excluded. Data is normally distributed and has equal variance. Hence we use t-test.

5.4.3.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -2.30      35.0      37.3 pfos_l… PFOS   VAN+P…    12    12     -1.46 0.159
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

  • p.adj: the adjusted p-value.

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s D

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_liver_pct PFOS   VAN+PFOS  -0.595    12    12 moderate

5.4.3.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(25,45),breaks = seq(25,45,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(45))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS serum day 8 in pct. of total

5.5 Total PFOS detected on day 8

This section will prepare to perform the data analysis for total PFOS on day 8.

5.5.1 Analysis and Barplot

5.5.1.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","tot_pfos8","pfos_serum8_mg","pfos_liver_mg")]

# Calculate ratios
for (i in tmp$rat_name) {
  tmp$total <- tmp$pfos_liver_mg + tmp$pfos_serum8_mg
  }
for (i in tmp$rat_name) {
  tmp$leftover <- tmp$tot_pfos8 - tmp$total
  }

# Calculate percentage of detected PFOS in rats
for (i in tmp$rat_name) {
  tmp$pct_det <- tmp$total / tmp$tot_pfos8 * 100
}

print("Group: PFOS")
## [1] "Group: PFOS"
tmp %>% subset(treatment == "PFOS") %>% select(pct_det) %>% summary()
##     pct_det     
##  Min.   :37.73  
##  1st Qu.:42.16  
##  Median :44.60  
##  Mean   :46.83  
##  3rd Qu.:51.56  
##  Max.   :57.59
print("Group: VAN+PFOS")
## [1] "Group: VAN+PFOS"
tmp %>% subset(treatment == "VAN+PFOS") %>% select(pct_det) %>% summary()
##     pct_det     
##  Min.   :41.29  
##  1st Qu.:45.41  
##  Median :46.69  
##  Mean   :47.88  
##  3rd Qu.:49.49  
##  Max.   :60.94
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pct_det"
SUBJECT <- "rat_name"

# Subset to a specific varible
dat.clean <- tmp

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 11
##   treatment rat_name ordering tot_pfos8 pfos_serum8_mg pfos_liver_mg total
##   <chr>     <chr>       <int>     <dbl>          <dbl>         <dbl> <dbl>
## 1 VAN+PFOS  R47            47      5.62           1.09          2.34  3.42
## # ℹ 4 more variables: leftover <dbl>, pct_det <dbl>, is.outlier <lgl>,
## #   is.extreme <lgl>
# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable statistic      p
##   <chr>     <chr>        <dbl>  <dbl>
## 1 PFOS      pct_det      0.945 0.570 
## 2 VAN+PFOS  pct_det      0.879 0.0853
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.946 0.341
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

Data contains one not extreme outlier, is normally distributed, and has equal variance. Therefore we perform unpaired two-tailed t-test.

5.5.1.2 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -1.05      46.8      47.9 pct_det PFOS   VAN+P…    12    12    -0.447 0.659
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Result of t-test show that there is no significant difference between the total percentage of detected PFOS (where 100% is the total dose) between the two groups.

5.5.1.3 Create barplots

# Prepare data columns for barplot
tmp2 <- rbind(data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$leftover, "type" = "Unaccounted"),
              data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_serum8_mg, "type" = "PFOS serum"),
              data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_liver_mg, "type" = "PFOS liver"))

# Create plot per rat
p <- ggplot(tmp2, aes(x = rat_name, y = mg, fill = fct_rev(type))) +
  geom_bar(position = "fill", stat = "identity") +
  theme_pubr(legend = "top") +
  facet_grid(~ treatment, scales = "free_x") +
  labs(fill = "Sample type", x = "Rats", y = "% of total dosed") +
  scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FEBE00", "PFOS serum" = "#A91B0D")) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)

# Create plot for average per treatment group
p <- ggplot(tmp2, aes(x = treatment, y = mg, fill = fct_rev(type))) +
  geom_bar(position = "fill", stat = "identity") +
  theme_pubr(legend = "top") +
  labs(fill = "Sample type", x = "Treatment",y = "% of total dosed") +
  scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FEBE00", "PFOS serum" = "#A91B0D")) +
  theme(axis.ticks.x=element_blank()) +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

6 SHORT CHAIN FATTY ACID DATA

Following section is handling data analysis of short chain fatty acids from colonic samples collected at day 8. Ten SCFAs are analysed by MS Omics A/S (Denmark), delivered as concentrations in millimolar (mM), and tested accordingly here.

6.1 Formic acid / Formate

6.1.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "formic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$formic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      formic      12 0.069 0.239
## 2 PFOS      formic      12 0.244 0.484
## 3 VAN       formic      12 0.12  0.283
## 4 VAN+PFOS  formic      12 0.298 0.466

6.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

6.1.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 PFOS      R31            19 yes   no     283.  284.  293.  301.  307.   315
## 4 PFOS      R32            20 yes   no     255.  262.  269.  276.  281    291
## 5 VAN       R14            26 no    yes    246.  256.  260.  267.  270.   274
## 6 VAN       R17            29 no    yes    268.  278.  274.  290.  295    295
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic      p.value
##   <chr>                <dbl>        <dbl>
## 1 residuals(model)     0.732 0.0000000500

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.922 0.438
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

Formic acid data contains vary little data above Limit of Detection (= 0.6) with a total of 10 valid data points. There are no outliers, Shapiro-Wilk test shows no normality and Levene test shows equal variance. We use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment. ### Kruskal-Wallis test

6.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.        n statistic    df     p method        
## * <chr>  <int>     <dbl> <int> <dbl> <chr>         
## 1 formic    48      2.44     3 0.486 Kruskal-Wallis
6.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.        n effsize method  magnitude
## * <chr>  <int>   <dbl> <chr>   <ord>    
## 1 formic    48 -0.0127 eta2[H] small
6.1.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.    group1 group2      n1    n2 statistic     p p.adj p.adj.signif
## * <chr>  <chr>  <chr>    <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 formic CTRL   PFOS        12    12     0.986 0.324 0.648 ns          
## 2 formic CTRL   VAN         12    12     0.400 0.689 0.689 ns          
## 3 formic CTRL   VAN+PFOS    12    12     1.45  0.148 0.648 ns          
## 4 formic PFOS   VAN         12    12    -0.585 0.558 0.689 ns          
## 5 formic PFOS   VAN+PFOS    12    12     0.462 0.644 0.689 ns          
## 6 formic VAN    VAN+PFOS    12    12     1.05  0.295 0.648 ns

6.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM formate",limits = c(0,1.6),breaks = seq(0,1.6,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 22 rows containing missing values (`geom_point()`).

p.formic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.formic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.2 Acetic acid / Acetate

6.2.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "acetic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(OUTCOME) & !dat$rat_name == "R08")

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      acetic      11  7.84 1.23 
## 2 PFOS      acetic      12  7.19 2.49 
## 3 VAN       acetic      12  3.02 0.674
## 4 VAN+PFOS  acetic      12  3.29 1.04

6.2.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

6.2.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R10            10 no    no     266.  273.  275.  285.  291.   294
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers, where one is extreme (R08). This outlier has been removed from the analysis.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.966   0.189

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    43      8.35 0.000174
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

Most data for this set is above Limit of Detection (= 0.5). This shows that SCFA concentration has two outliers, where one is extreme and has been removed from analysis. Shapiro-Wilk test show normality but the data has unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.2.4 Kruskal-Wallis test

6.2.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.        n statistic    df           p method        
## * <chr>  <int>     <dbl> <int>       <dbl> <chr>         
## 1 acetic    47      31.4     3 0.000000692 Kruskal-Wallis
6.2.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.        n effsize method  magnitude
## * <chr>  <int>   <dbl> <chr>   <ord>    
## 1 acetic    47   0.661 eta2[H] large
6.2.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.    group1 group2      n1    n2 statistic         p     p.adj p.adj.signif
## * <chr>  <chr>  <chr>    <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 acetic CTRL   PFOS        11    12    -0.478 0.633     0.743     ns          
## 2 acetic CTRL   VAN         11    12    -4.31  0.0000165 0.0000992 ****        
## 3 acetic CTRL   VAN+PFOS    11    12    -3.99  0.0000670 0.000181  ***         
## 4 acetic PFOS   VAN         12    12    -3.92  0.0000903 0.000181  ***         
## 5 acetic PFOS   VAN+PFOS    12    12    -3.59  0.000333  0.000500  ***         
## 6 acetic VAN    VAN+PFOS    12    12     0.328 0.743     0.743     ns

6.2.4.1 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM acetate",limits = c(0,15),breaks = seq(0,15,2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")  +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "#2f2f2f")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(14,15,12,13))
p

p.acetic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.acetic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

PFOS amount in mg at day 4 and 8

6.3 Propanoic acid / Propionate

6.3.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$propanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      propanoic    12 0.268 0.113
## 2 PFOS      propanoic    12 0.265 0.095
## 3 VAN       propanoic    12 0.264 0.086
## 4 VAN+PFOS  propanoic    12 0.325 0.102

6.3.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 VAN       R15            27 no    yes    268.  277.  283.  290.  296.   300
## 4 VAN       R18            30 no    yes    266.  275.  282.  285.  288.   298
## 5 VAN       R22            34 no    yes    292.  296.  301.  313.  311.   321
## 6 VAN+PFOS  R43            43 yes   yes    292.  301.  300.  313.  316.   322
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains five outliers, where none are extreme. As removing outliers does not affect final outcome they are left in the analysis.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.980   0.586

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.280 0.840
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that propanoic acid concentration has 5 outliers, Shapiro-Wilk test shows normality and Levene test shows equal variance. Therefore we will use one-way ANOVA test with Tukey’s honest significance test. ### ANOVA One-Way test

6.3.2.1 Perform test

If we had equality of variance we can now run a one-way ANOVA tests anova_test() (if we have equal variance) or a welch_anova_test() (if variance vary).

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd     F     p p<.05   ges
## 1 treatment   3  44 1.068 0.372       0.068

6.3.2.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term   group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr>  <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 treat… CTRL   PFOS            0 -0.00295  -0.111      0.105 1     ns          
## 2 treat… CTRL   VAN             0 -0.00466  -0.113      0.104 0.999 ns          
## 3 treat… CTRL   VAN+P…          0  0.0566   -0.0516     0.165 0.508 ns          
## 4 treat… PFOS   VAN             0 -0.00171  -0.110      0.107 1     ns          
## 5 treat… PFOS   VAN+P…          0  0.0596   -0.0487     0.168 0.464 ns          
## 6 treat… VAN    VAN+P…          0  0.0613   -0.0470     0.170 0.44  ns

6.3.3 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM propionate",limits = c(0,0.6),breaks = seq(0,0.6,0.2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).

p.propanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.propanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.4 2-methyl-Propanoic acid / Isobutyrate

6.4.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m2_propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m2_propanoic)) # & !dat$rat_name == "R01")

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable         n  mean    sd
##   <chr>     <fct>        <dbl> <dbl> <dbl>
## 1 CTRL      m2_propanoic    12 0.049 0.025
## 2 PFOS      m2_propanoic    12 0.05  0.017
## 3 VAN       m2_propanoic    12 0.005 0.018
## 4 VAN+PFOS  m2_propanoic    12 0.01  0.027

6.4.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 3 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 4 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## 5 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains one outliers, where one is extreme (R01). This outlier has been removed from the analysis -> leading to new outlier but not extreme.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic     p.value
##   <chr>                <dbl>       <dbl>
## 1 residuals(model)     0.755 0.000000145

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.633 0.598
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that 2-methyl-propanoic acid concentration has five outliers of which four are extreme outlier - these arise from several samples = 0, and based on this they will be left in. Shapiro-Wilk test show normality but the data has unequal variance. Furthermore, very few samples above Limit of Detection (= 0.02) are observed in vancomycin treated samples. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.4.3 Kruskal-Wallis test

6.4.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.              n statistic    df          p method        
## * <chr>        <int>     <dbl> <int>      <dbl> <chr>         
## 1 m2_propanoic    48      26.2     3 0.00000882 Kruskal-Wallis
6.4.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.              n effsize method  magnitude
## * <chr>        <int>   <dbl> <chr>   <ord>    
## 1 m2_propanoic    48   0.526 eta2[H] large
6.4.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.          group1 group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>        <chr>  <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 m2_propanoic CTRL   PFOS       12    12    0.0383 9.69e-1 9.69e-1 ns          
## 2 m2_propanoic CTRL   VAN        12    12   -3.73   1.94e-4 5.82e-4 ***         
## 3 m2_propanoic CTRL   VAN+PF…    12    12   -3.46   5.44e-4 8.15e-4 ***         
## 4 m2_propanoic PFOS   VAN        12    12   -3.76   1.67e-4 5.82e-4 ***         
## 5 m2_propanoic PFOS   VAN+PF…    12    12   -3.50   4.71e-4 8.15e-4 ***         
## 6 m2_propanoic VAN    VAN+PF…    12    12    0.268  7.88e-1 9.46e-1 ns

6.4.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isobutyrate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.102,0.11,0.096,0.088))
p
## Warning: Removed 13 rows containing missing values (`geom_point()`).

p.m2p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m2p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.5 Butanoic acid / Butyrate

6.5.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$butanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      butanoic    12 1.28  0.655
## 2 PFOS      butanoic    12 1.14  0.625
## 3 VAN       butanoic    12 0.187 0.195
## 4 VAN+PFOS  butanoic    12 0.193 0.109

6.5.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R20            32 no    yes    285   293.  301.  310.  317.   321
## 2 VAN+PFOS  R48            48 yes   yes    224.  229.  234.  239.  242.   250
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers, where one is extreme (R20). This outlier does not affect the final results or type of analysis and has been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.938  0.0130

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    44      7.69 0.000309
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

Most data for this set is above Limit of Detection (= 0.03). This shows that butanioc acid concentration has two outliers, where one is extreme and has been removed from analysis. Shapiro-Wilk test shows no normality and Levene test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.5.3 Kruskal-Wallis test

6.5.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.          n statistic    df         p method        
## * <chr>    <int>     <dbl> <int>     <dbl> <chr>         
## 1 butanoic    48      27.4     3 0.0000048 Kruskal-Wallis
6.5.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.          n effsize method  magnitude
## * <chr>    <int>   <dbl> <chr>   <ord>    
## 1 butanoic    48   0.555 eta2[H] large
6.5.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.      group1 group2      n1    n2 statistic         p    p.adj p.adj.signif
## * <chr>    <chr>  <chr>    <int> <int>     <dbl>     <dbl>    <dbl> <chr>       
## 1 butanoic CTRL   PFOS        12    12   -0.0729 0.942     0.942    ns          
## 2 butanoic CTRL   VAN         12    12   -3.95   0.0000777 0.000315 ***         
## 3 butanoic CTRL   VAN+PFOS    12    12   -3.50   0.000467  0.000918 ***         
## 4 butanoic PFOS   VAN         12    12   -3.88   0.000105  0.000315 ***         
## 5 butanoic PFOS   VAN+PFOS    12    12   -3.43   0.000612  0.000918 ***         
## 6 butanoic VAN    VAN+PFOS    12    12    0.452  0.651     0.782    ns

6.5.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM butyrate",limits = c(0,3),breaks = seq(0,3,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.6,3,2.4,2.8))
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).

p.butanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.butanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.6 3-methyl-Butanoic acid / Isovalerate

6.6.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m3_butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m3_butanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable        n  mean    sd
##   <chr>     <fct>       <dbl> <dbl> <dbl>
## 1 CTRL      m3_butanoic    12 0.03  0.017
## 2 PFOS      m3_butanoic    12 0.033 0.012
## 3 VAN       m3_butanoic    12 0.005 0.018
## 4 VAN+PFOS  m3_butanoic    12 0.006 0.022

6.6.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains three outliers arising from several data points being below Limit of detection. Furthermore removing extreme outliers does not affect the result or analysis - these have therefore been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic       p.value
##   <chr>                <dbl>         <dbl>
## 1 residuals(model)     0.678 0.00000000551

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.391 0.760
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that 3-methyl-butanoic acid concentration has three outliers, which has been left in. Shapiro-Wilk test show no normality but the data has equal variance. Furthermore, very few samples above Limit of Detection (= 0.02) are observed in vancomycin treated samples. We use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.6.3 Kruskal-Wallis test

6.6.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.             n statistic    df         p method        
## * <chr>       <int>     <dbl> <int>     <dbl> <chr>         
## 1 m3_butanoic    48      25.6     3 0.0000117 Kruskal-Wallis
6.6.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.             n effsize method  magnitude
## * <chr>       <int>   <dbl> <chr>   <ord>    
## 1 m3_butanoic    48   0.513 eta2[H] large
6.6.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.         group1 group2      n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>       <chr>  <chr>    <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 m3_butanoic CTRL   PFOS        12    12    0.556  5.78e-1 6.94e-1 ns          
## 2 m3_butanoic CTRL   VAN         12    12   -3.29   9.96e-4 1.67e-3 **          
## 3 m3_butanoic CTRL   VAN+PFOS    12    12   -3.26   1.11e-3 1.67e-3 **          
## 4 m3_butanoic PFOS   VAN         12    12   -3.85   1.19e-4 4.05e-4 ***         
## 5 m3_butanoic PFOS   VAN+PFOS    12    12   -3.82   1.35e-4 4.05e-4 ***         
## 6 m3_butanoic VAN    VAN+PFOS    12    12    0.0309 9.75e-1 9.75e-1 ns

6.6.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isovalerate",limits = c(0,0.1),breaks = seq(0,0.1,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.092,0.1,0.084,0.076))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.m3b <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m3b,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.7 Pentanoic acid / Valerate

6.7.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$pentanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      pentanoic    12 0.047 0.023
## 2 PFOS      pentanoic    12 0.048 0.025
## 3 VAN       pentanoic    12 0.004 0.015
## 4 VAN+PFOS  pentanoic    12 0.007 0.026

6.7.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R26            14 yes   no     238.  246.  248   257   259.   265
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains three outliers arising from several data points being below Limit of detection. Furthermore removing extreme outliers does not affect the result or analysis - these have therefore been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.820 0.00000372

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44      1.72 0.177
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that pentanoic acid concentration has one outlier, which has been left in for the analysis. Shapiro-Wilk test shows no normality and Levene test shows equal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.7.3 Kruskal-Wallis test

6.7.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.           n statistic    df          p method        
## * <chr>     <int>     <dbl> <int>      <dbl> <chr>         
## 1 pentanoic    48      27.3     3 0.00000509 Kruskal-Wallis
6.7.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.           n effsize method  magnitude
## * <chr>     <int>   <dbl> <chr>   <ord>    
## 1 pentanoic    48   0.552 eta2[H] large
6.7.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.       group1 group2      n1    n2 statistic        p    p.adj p.adj.signif
## * <chr>     <chr>  <chr>    <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
## 1 pentanoic CTRL   PFOS        12    12    0.0155 0.988    0.988    ns          
## 2 pentanoic CTRL   VAN         12    12   -3.76   0.000173 0.000448 ***         
## 3 pentanoic CTRL   VAN+PFOS    12    12   -3.62   0.000299 0.000448 ***         
## 4 pentanoic PFOS   VAN         12    12   -3.77   0.000163 0.000448 ***         
## 5 pentanoic PFOS   VAN+PFOS    12    12   -3.63   0.000282 0.000448 ***         
## 6 pentanoic VAN    VAN+PFOS    12    12    0.139  0.889    0.988    ns

6.7.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM valerate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.086,0.094,0.11,0.102))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.pentanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.pentanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.8 4-methyl-Pentanoic acid / Isocaproate

6.8.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m4_pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m4_pentanoic))# & !dat$rat_name %in% c("R42","R45"))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable         n  mean    sd
##   <chr>     <fct>        <dbl> <dbl> <dbl>
## 1 CTRL      m4_pentanoic    12 0.018 0.023
## 2 PFOS      m4_pentanoic    12 0.03  0.058
## 3 VAN       m4_pentanoic    12 0.035 0.074
## 4 VAN+PFOS  m4_pentanoic    12 0.216 0.513

6.8.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R25            13 yes   no     339.  340.  353.  364.  348.   358
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 4 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## 5 VAN+PFOS  R45            45 yes   yes    234.  239.  244.  253.  262.   263
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains five outliers, however, these outliers arise several data points being under Limit of Detection (=0.03). These data points are therefore kept.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.461 4.94e-12

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44      1.62 0.198
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that 4-methyl-pentanoic acid concentration has five outliers and only few data points are above Limit of Detection (= 0.03) - outliers are therefore kept. Shapiro-Wilk test show no normality but the data has equal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.8.3 Kruskal-Wallis test

6.8.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.              n statistic    df     p method        
## * <chr>        <int>     <dbl> <int> <dbl> <chr>         
## 1 m4_pentanoic    48      1.32     3 0.724 Kruskal-Wallis
6.8.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.              n effsize method  magnitude
## * <chr>        <int>   <dbl> <chr>   <ord>    
## 1 m4_pentanoic    48 -0.0381 eta2[H] small
6.8.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.          group1 group2      n1    n2 statistic     p p.adj p.adj.signif
## * <chr>        <chr>  <chr>    <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 m4_pentanoic CTRL   PFOS        12    12   -0.165  0.869 0.987 ns          
## 2 m4_pentanoic CTRL   VAN         12    12    0.0165 0.987 0.987 ns          
## 3 m4_pentanoic CTRL   VAN+PFOS    12    12    0.875  0.381 0.781 ns          
## 4 m4_pentanoic PFOS   VAN         12    12    0.182  0.856 0.987 ns          
## 5 m4_pentanoic PFOS   VAN+PFOS    12    12    1.04   0.298 0.781 ns          
## 6 m4_pentanoic VAN    VAN+PFOS    12    12    0.859  0.391 0.781 ns

6.8.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isocaproate",limits = c(0,1.78),breaks = seq(0,1.78,0.2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.m4p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m4p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.9 Hexanoic acid / Caproate

6.9.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "hexanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$hexanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      hexanoic    12 0.053 0.041
## 2 PFOS      hexanoic    12 0.07  0.058
## 3 VAN       hexanoic    12 0.004 0.014
## 4 VAN+PFOS  hexanoic    12 0.006 0.021

6.9.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 2 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers, both arising due to majority of data points in the vancomycin treated groups are below Limit of Detection (=0.01). Therefore these outliers are left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic   p.value
##   <chr>                <dbl>     <dbl>
## 1 residuals(model)     0.874 0.0000992

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    44      6.86 0.000687
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that hexanoic acid concentration has two outliers and several datapoint below Limit of Detection. Shapiro-Wilk test shows no normality and Levene test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.

6.9.3 Kruskal-Wallis test

6.9.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.          n statistic    df          p method        
## * <chr>    <int>     <dbl> <int>      <dbl> <chr>         
## 1 hexanoic    48      28.0     3 0.00000356 Kruskal-Wallis
6.9.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.          n effsize method  magnitude
## * <chr>    <int>   <dbl> <chr>   <ord>    
## 1 hexanoic    48   0.569 eta2[H] large
6.9.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.      group1 group2      n1    n2 statistic         p    p.adj p.adj.signif
## * <chr>    <chr>  <chr>    <int> <int>     <dbl>     <dbl>    <dbl> <chr>       
## 1 hexanoic CTRL   PFOS        12    12    0.717  0.473     0.568    ns          
## 2 hexanoic CTRL   VAN         12    12   -3.39   0.000699  0.00139  **          
## 3 hexanoic CTRL   VAN+PFOS    12    12   -3.31   0.000927  0.00139  **          
## 4 hexanoic PFOS   VAN         12    12   -4.11   0.0000401 0.000168 ***         
## 5 hexanoic PFOS   VAN+PFOS    12    12   -4.03   0.0000560 0.000168 ***         
## 6 hexanoic VAN    VAN+PFOS    12    12    0.0779 0.938     0.938    ns

6.9.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM caproate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 15 rows containing missing values (`geom_point()`).

p.hexanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.hexanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.10 Heptanoic acid / Enanthate

6.10.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "heptanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$heptanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      heptanoic    12 0.004 0.012
## 2 PFOS      heptanoic    12 0     0    
## 3 VAN       heptanoic    12 0.009 0.033
## 4 VAN+PFOS  heptanoic    12 0.008 0.028

6.10.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. the data for each group should be approximately normally distributed.

  • Homogeneity of variances. the variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R10            10 no    no     266.  273.  275.  285.  291.   294
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.399 9.66e-13

Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  1. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.446 0.722
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that heptanoic acid concentration only three data points above Limit of Detection (= 0.03) which is deemed too low for analysis. No final analysis therefore made.

6.10.3 Create figure

#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM enanthate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
#p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 25 rows containing missing values (`geom_point()`).

p.heptanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.heptanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

SCFA in mM

6.11 SCFA ggarrange

Here all plots from the SCFA analysis is combined into one figure.

params <- readRDS("R_objects/animal_params.RDS")

# Load rdata files with scfa plots
pfiles <- list.files(path = "R_objects/scfa/", pattern = "*.rdata", full.names = TRUE)
lapply(pfiles, load,.GlobalEnv)
## [[1]]
## [1] "p.acetic"
## 
## [[2]]
## [1] "p.butanoic"
## 
## [[3]]
## [1] "p.formic"
## 
## [[4]]
## [1] "p.heptanoic"
## 
## [[5]]
## [1] "p.hexanoic"
## 
## [[6]]
## [1] "p.m2p"
## 
## [[7]]
## [1] "p.m3b"
## 
## [[8]]
## [1] "p.m4p"
## 
## [[9]]
## [1] "p.pentanoic"
## 
## [[10]]
## [1] "p.propanoic"
# Create plot
p.all <- ggarrange(p.formic,p.acetic,p.propanoic,p.m2p,p.butanoic,p.m3b,p.pentanoic,p.m4p,p.hexanoic,p.heptanoic,
                   ncol = 5, nrow = 2, 
                   common.legend = TRUE,
                   legend = "top",
                   label.x = 0,
                   font.label = list(size = 24, face = "bold"),
                   labels = c("A","B","C","D","E","F","G","H","I","J"),
                   align = "hv")
## Warning: Removed 22 rows containing missing values (`geom_point()`).
## Removed 22 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 25 rows containing missing values (`geom_point()`).
p.all

# Save graphics
ggsave(filename = "plots/animal_data/scfa/all.png", p.all, device = "png", dpi = 300, height = 200, width = 400, units = "mm")
ggsave(filename = "plots/animal_data/scfa/all.pdf", p.all, device = "pdf", dpi = 300, height = 200, width = 400, units = "mm")

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

7 SETTINGS

Overview of the parameters and packages that were used for this analysis

7.1 PARAMETERS

The following paramenters were set in for this analysis:

params <- readRDS("R_objects/import_params.RDS")

tmp <- unlist(params)
dat <- data.frame(Parameter = names(tmp), Value = unname(tmp))


kbl(dat, row.names = F) %>% kable_classic(lightable_options = "striped")

7.2 PACKAGES

The analysis was run in R version 4.2.2 using the following packages:

pack <- data.frame(Package = (.packages()))

for (i in seq(nrow(pack))) pack$Version[i] <- as.character(packageVersion(pack$Package[i]))

kbl(pack[order(pack$Package),], row.names = F) %>% kable_classic(lightable_options = "striped")   
Package Version
ape 5.7
base 4.2.2
datasets 4.2.2
decontam 1.18.0
dplyr 1.1.0
forcats 1.0.0
ggbreak 0.1.1
ggplot2 3.4.2
ggpubr 0.6.0
graphics 4.2.2
grDevices 4.2.2
kableExtra 1.3.4
lattice 0.20.45
lubridate 1.9.2
methods 4.2.2
pals 1.7
permute 0.9.7
phangorn 2.11.1
phyloseq 1.42.0
plotly 4.10.1
purrr 1.0.1
readr 2.1.4
rstatix 0.7.2
stats 4.2.2
stringr 1.5.0
tibble 3.1.8
tidyr 1.3.0
tidyverse 2.0.0
utils 4.2.2
vegan 2.6.4